source("./Mean Reversion/RMR.001 Load Packages.R")
pricing_data <- read_csv("./Mean Reversion/Raw Data/pricing data clean.csv")
## Parsed with column specification:
## cols(
##   date_unix = col_integer(),
##   date_time = col_datetime(format = ""),
##   high = col_double(),
##   low = col_double(),
##   open = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   quote_volume = col_double(),
##   weighted_average = col_double(),
##   currency_pair = col_character(),
##   period = col_integer()
## )
Description
Spreads Poloniex pricing data into wide format (one column of closing prices per currency pair) and filters the data to a specified time resolution and time window.
Arguments
df: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
start_date: The start date of the time window.
end_date: The end date of the time window.
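As a rough illustration of the filtering and reshaping step, the sketch below works on a small made-up table in base R. The prices, the two currency pairs, and the variable names are placeholders, and reshape() stands in for the filter() and spread() pipeline used in prepare_data().

```r
# Toy tidy data: one row per (timestamp, currency pair, period). Values are made up.
tidy <- data.frame(
  date_unix     = c(0, 0, 7200, 7200, 0),
  currency_pair = c("USDT_BTC", "USDT_ETH", "USDT_BTC", "USDT_ETH", "USDT_BTC"),
  close         = c(100, 10, 110, 11, 999),
  period        = c(7200, 7200, 7200, 7200, 300),
  stringsAsFactors = FALSE
)

# Step 1: keep a single time resolution (mirrors filter(period == time_resolution)).
one_res <- tidy[tidy$period == 7200, c("date_unix", "currency_pair", "close")]

# Step 2: one column per currency pair (mirrors spread(currency_pair, close)).
wide <- reshape(one_res, idvar = "date_unix", timevar = "currency_pair", direction = "wide")
names(wide) <- sub("^close\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

The wide layout is what lets later functions look up a coin's price series by name, for example train[["USDT_BTC"]].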
prepare_data <- function(df, time_resolution, start_date, end_date) {
  df <- df %>%
    filter(period == time_resolution,
           date_time >= start_date,
           date_time <= end_date) %>%
    select(date_unix, date_time, close, currency_pair) %>%
    spread(currency_pair, close)
  return(df)
}
Description
The Engle-Granger method is used to test for cointegration. The method consists of two steps: (1) Perform a linear regression of log(coin_y) on log(coin_x). (2) Perform an Augmented Dickey-Fuller (ADF) test on the residuals from the regression estimated in (1). The ADF test is specified with a non-zero mean (drift), no time trend, and one lagged difference. The function returns the ADF test statistic, that is, the t-statistic on the lagged residual level.
Arguments
coin_y: A vector containing the pricing data for the dependent coin in the regression.
coin_x: A vector containing the pricing data for the independent coin in the regression.
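The two steps can be sketched without any extra packages. The example below is only an illustration on simulated series that stand in for log prices; the helper adf_t_stat() is a hypothetical name, and it fits the ADF regression by hand with lm() using the same specification described above (drift, one lagged difference) rather than calling ur.df().

```r
# Simulated stand-ins for two log-price series (made-up data, not Poloniex prices).
set.seed(1)
n <- 500
log_x <- cumsum(rnorm(n))               # random walk, non-stationary
log_y <- 2 + 0.5 * log_x + rnorm(n)     # linear in log_x plus stationary noise

# Step 1: cointegrating regression; keep the residuals.
res <- residuals(lm(log_y ~ log_x))

# Step 2: ADF regression with drift and one lagged difference:
#   diff(z)_t = a + g * z_{t-1} + b * diff(z)_{t-1} + e_t
# The test statistic is the t value of g.
adf_t_stat <- function(z) {
  dz <- diff(z)
  k <- length(dz)
  d_now <- dz[2:k]          # diff(z)_t
  level_lag <- z[2:k]       # z_{t-1}
  d_lag <- dz[1:(k - 1)]    # diff(z)_{t-1}
  fit <- summary(lm(d_now ~ level_lag + d_lag))
  fit$coefficients["level_lag", "t value"]
}

adf_stat <- adf_t_stat(res)
print(adf_stat)  # strongly negative when the residuals are stationary
```

A more negative statistic is stronger evidence against a unit root in the residuals, which is why the pairs are later sorted in ascending order of this value.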
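test_cointegration <- function(coin_y, coin_x) {
  # Step 1: cointegrating regression on log prices
  lm_model <- lm(log(coin_y) ~ log(coin_x))
  residuals <- lm_model[["residuals"]]
  # Step 2: ADF test on the residuals (drift, one lagged difference)
  adf_test <- ur.df(residuals, type = "drift", lags = 1)
  # Row 2, column 3 of the coefficient table is the t value of the lagged level
  df_stat <- adf_test@testreg[["coefficients"]][2, 3]
  return(df_stat)
}
Description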
Two sets of currency pairs are examined: currency pairs where USDT is the quote currency and currency pairs where BTC is the quote currency. All combinations of coins within each set are created. Combinations that consist of the coin with itself are removed. The function returns a dataframe containing the coin pairs.
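A minimal sketch of the pair construction, using three placeholder coin names instead of real tickers. It shows that the result contains ordered pairs, so each combination appears twice, once in each regression direction.

```r
coins <- c("A", "B", "C")  # placeholder names, not real tickers

# All ordered combinations, then drop a coin paired with itself.
pairs <- expand.grid(coin_y = coins, coin_x = coins, stringsAsFactors = FALSE)
pairs <- pairs[pairs$coin_y != pairs$coin_x, ]
rownames(pairs) <- NULL

print(pairs)
print(nrow(pairs))  # 3 * 2 ordered pairs
```

By the same counting, the 8 USDT-quoted and 7 BTC-quoted coins listed in create_pairs() give 8 * 7 + 7 * 6 = 98 ordered pairs.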
create_pairs <- function() {
  coins_usdt <- c("USDT_BTC", "USDT_DASH", "USDT_ETH", "USDT_LTC", "USDT_REP", "USDT_XEM", "USDT_XMR", "USDT_ZEC")
  coins_btc <- c("BTC_DASH", "BTC_ETH", "BTC_LTC", "BTC_REP", "BTC_XEM", "BTC_XMR", "BTC_ZEC")
  coin_pairs <- rbind(expand.grid(coins_usdt, coins_usdt), expand.grid(coins_btc, coins_btc)) %>%
    rename(coin_y = Var1, coin_x = Var2) %>%
    filter(coin_y != coin_x) %>%
    mutate_if(is.factor, as.character) %>%
    as_tibble()
  return(coin_pairs)
}
Description
Test for cointegration between each coin pair generated by the create_pairs() function. The test for cointegration is performed by the test_cointegration() function. The function returns a dataframe containing the coin pairs and the ADF test statistic resulting from testing cointegration between each coin pair.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
coin_pairs: A dataframe generated by create_pairs().
test_pairs <- function(train, coin_pairs) {
  # Preallocate one ADF statistic per coin pair
  adf_stat <- numeric(nrow(coin_pairs))
  for (n in seq_len(nrow(coin_pairs))) {
    coin_y <- coin_pairs[[n, "coin_y"]]
    coin_x <- coin_pairs[[n, "coin_x"]]
    adf_stat[n] <- test_cointegration(train[[coin_y]], train[[coin_x]])
  }
  # Most negative (strongest evidence of cointegration) first
  df <- coin_pairs %>%
    mutate(adf_stat = adf_stat) %>%
    arrange(adf_stat)
  return(df)
}
Description
Select cointegrated coin pairs to be used in a mean reversion strategy. The current selection logic keeps every coin pair whose ADF test statistic is less than or equal to -2.57.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
coin_pairs: A dataframe generated by create_pairs().
select_pairs <- function(train, coin_pairs) {
  df <- test_pairs(train = train, coin_pairs = coin_pairs) %>%
    filter(adf_stat <= -2.57)
  return(df)
}
Description
Generate trading signals that indicate the current position in the spread formed by a linear combination of coin y and coin x. A signal of +1 indicates a long position in the spread, 0 indicates a flat position, and -1 indicates a short position in the spread. Signals are generated for the test set using a model trained on the training set.
The current trading logic is to perform a linear regression of log(coin y) on log(coin x) using the training set. A spread is then calculated in the test set using the fitted hedge ratio and intercept from the regression. The z-score of the spread is calculated using the mean and standard deviation of the training-set residuals. A long position is entered when the z-score falls to or below -threshold_z (for example, -2) and is closed when the z-score returns to 0 or above. A short position is entered when the z-score rises to or above +threshold_z (for example, +2) and is closed when the z-score returns to 0 or below. Signals are lagged by one period, so a position is taken in the period after the z-score crosses a threshold.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.
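The entry and exit rules can be traced on a short, made-up z-score series. The sketch below is a loop-based restatement in base R under the assumption that the long and short legs are tracked independently and then summed, as in generate_signals(); the function name signals_from_z() is hypothetical, and the loop replaces the ifelse() and na.locf() steps.

```r
signals_from_z <- function(z, threshold) {
  signal <- numeric(length(z))
  long <- 0    # state of the long leg: 1 while held, 0 otherwise
  short <- 0   # state of the short leg: -1 while held, 0 otherwise
  for (t in seq_along(z)) {
    if (t > 1) {
      prev <- z[t - 1]  # decisions use the previous period's z-score
      if (prev <= -threshold) long <- 1 else if (prev >= 0) long <- 0
      if (prev >= threshold) short <- -1 else if (prev <= 0) short <- 0
    }
    signal[t] <- long + short
  }
  signal
}

z <- c(0.5, -2.5, -1, 0.2, 2.4, 1, -0.1, 0)  # made-up z-scores
sig <- signals_from_z(z, threshold = 2)
print(rbind(z = z, signal = sig))
```

In this trace the long position opens one period after the z-score reaches -2.5 and closes one period after it rises above 0; the short position behaves symmetrically around +2.4 and -0.1.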
generate_signals <- function(train, test, coin_y, coin_x, threshold_z) {
  # Fit the cointegrating regression on the training set
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))
  intercept <- coef(model)[1]
  hedge_ratio <- coef(model)[2]
  df_signals <- test %>%
    mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept,
           # Standardize the test-set spread with the training-set residual moments
           spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]),
           # Long leg: enter at or below -threshold_z, exit at or above 0, otherwise hold
           signal_long = ifelse(lag(spread_z, 1) <= -threshold_z, 1, NA),
           signal_long = ifelse(lag(spread_z, 1) >= 0, 0, signal_long),
           signal_long = na.locf(signal_long, na.rm = FALSE),
           # Short leg: enter at or above +threshold_z, exit at or below 0, otherwise hold
           signal_short = ifelse(lag(spread_z, 1) >= threshold_z, -1, NA),
           signal_short = ifelse(lag(spread_z, 1) <= 0, 0, signal_short),
           signal_short = na.locf(signal_short, na.rm = FALSE),
           # Combine the legs; undefined leading values are treated as flat
           signal = signal_long + signal_short,
           signal = ifelse(is.na(signal), 0, signal))
  return(df_signals[["signal"]])
}
Description
Calculate the return of a cointegration-based mean reversion trading strategy using coin y and coin x.
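The current backtesting logic uses signals generated by generate_signals(). coin_y_return and coin_x_return are the one-period percentage returns of each coin. coin_y_position and coin_x_position are the market values of the position held in each coin, expressed in the quote currency (approximately USD for USDT-quoted pairs, BTC for BTC-quoted pairs): one unit of coin y against hedge_ratio units of coin x on the opposite side. coin_y_pnl and coin_x_pnl are the profit and loss of each coin in the same units. combined_position is the gross market value of the two positions, and combined_return is the combined profit and loss divided by the previous period's gross market value.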
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.
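The position and return arithmetic can be checked by hand on three periods of made-up prices. The sketch below uses base R only; lag1() is a hypothetical stand-in for dplyr's lag(), and the hedge ratio of 0.5 is an assumed value rather than a fitted one.

```r
lag1 <- function(v) c(NA, head(v, -1))   # base R stand-in for dplyr::lag()

price_y <- c(100, 110, 121)   # made-up prices for coin y
price_x <- c(50, 50, 55)      # made-up prices for coin x
signal <- c(1, 1, 0)          # long the spread for two periods, then flat
hedge_ratio <- 0.5            # assumed value, not estimated here

ret_y <- price_y / lag1(price_y) - 1
ret_x <- price_x / lag1(price_x) - 1
pos_y <- price_y * signal                       # long one unit of coin y
pos_x <- price_x * signal * hedge_ratio * -1    # short hedge_ratio units of coin x
pnl <- lag1(pos_y) * ret_y + lag1(pos_x) * ret_x
gross <- abs(pos_y) + abs(pos_x)
period_return <- pnl / lag1(gross)              # return on the previous gross exposure
period_return[is.na(period_return)] <- 0
cumulative <- cumprod(1 + period_return)
print(cumulative)
```

Dividing by the lagged gross market value means each period's return is measured against the capital that was actually deployed going into that period.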
backtest_pair <- function(train, test, coin_y, coin_x, threshold_z) {
  # Hedge ratio from the training-set regression
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))
  intercept <- coef(model)[1]
  hedge_ratio <- coef(model)[2]
  df_backtest <- test %>%
    mutate(signal = generate_signals(train, test, coin_y, coin_x, threshold_z),
           coin_y_return = test[[coin_y]] / lag(test[[coin_y]]) - 1,
           coin_x_return = test[[coin_x]] / lag(test[[coin_x]]) - 1,
           # One unit of coin y against hedge_ratio units of coin x on the opposite side
           coin_y_position = test[[coin_y]] * signal * 1,
           coin_x_position = test[[coin_x]] * signal * hedge_ratio * -1,
           # P&L is earned on the position held at the end of the previous period
           coin_y_pnl = lag(coin_y_position, 1) * coin_y_return,
           coin_x_pnl = lag(coin_x_position, 1) * coin_x_return,
           combined_position = abs(coin_y_position) + abs(coin_x_position),
           combined_pnl = coin_y_pnl + coin_x_pnl,
           combined_return = combined_pnl / lag(combined_position, 1)) %>%
    # Replace NA and NaN (flat periods, first row) with 0; this also coerces date_time to numeric
    mutate_all(~ ifelse(is.na(.), 0, .)) %>%
    mutate(date_time = as.POSIXct(date_unix, origin = "1970-01-01"),
           return_pair = cumprod(1 + combined_return))
  return(df_backtest[["return_pair"]])
}
Description
Calculate the return of a cointegration-based mean reversion trading strategy using an equally weighted portfolio of cointegrated coin pairs.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pairs.
test: A dataframe generated by prepare_data() that represents the test set for the coin pairs.
cointegrated_coins: A dataframe generated by select_pairs() that represents a set of cointegrated coin pairs.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.
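The equal weighting amounts to averaging the pairs' cumulative return series at each timestamp. A small base R sketch with made-up cumulative returns for two hypothetical pairs:

```r
# Cumulative returns per pair (rows are timestamps). Values are made up.
pair_returns <- cbind(
  pair_1 = c(1.00, 1.02, 1.05),
  pair_2 = c(1.00, 0.99, 1.01)
)

# Equal-weighted strategy: mean across pairs at each timestamp.
return_strategy <- rowMeans(pair_returns)
print(return_strategy)
```

Because the average is taken over cumulative returns rather than per-period returns, this corresponds to splitting the starting capital equally across pairs and not rebalancing during the test window.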
backtest_strategy <- function(train, test, cointegrated_coins, threshold_z) {
  df <- tibble()
  for (i in 1:nrow(cointegrated_coins)) {
    single_pair <- tibble(return_pair = backtest_pair(train,
                                                      test,
                                                      cointegrated_coins[["coin_y"]][i],
                                                      cointegrated_coins[["coin_x"]][i],
                                                      threshold_z),
                          coin_y = cointegrated_coins[["coin_y"]][i],
                          coin_x = cointegrated_coins[["coin_x"]][i],
                          date_time = test[["date_time"]])
    df <- bind_rows(df, single_pair)
  }
  df <- df %>%
    group_by(date_time) %>%
    summarise(return_strategy = mean(return_pair))
  return(df[["return_strategy"]])
}
Description
Create plots of a cointegration-based mean reversion trading strategy using coin y and coin x. This function creates two plots. The first plot displays the spread transformed into a z-score in blue, with three red reference lines at -2, 0, and 2. A green line indicates the signal, which can take the values -1, 0, and +1. The second plot displays the cumulative return of the model in dark blue. Two additional lines show the buy-and-hold returns of coin y and coin x in dark red and dark green, respectively.
Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.
plot_backtest <- function(train, test, coin_y, coin_x, threshold_z) {
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))
  intercept <- coef(model)[1]
  hedge_ratio <- coef(model)[2]
  df_plot <- test %>%
    mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept,
           spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]),
           signal = generate_signals(train, test, coin_y, coin_x, threshold_z),
           return_pair = backtest_pair(train, test, coin_y, coin_x, threshold_z),
           return_buyhold_y = test[[coin_y]] / test[[coin_y]][1],
           return_buyhold_x = test[[coin_x]] / test[[coin_x]][1])
  print(ggplot(df_plot, aes(x = date_time)) +
          geom_line(aes(y = spread_z, colour = "Spread Z"), size = 1) +
          geom_line(aes(y = signal, colour = "Signal"), size = 0.5) +
          geom_hline(yintercept = 0, colour = "red", alpha = 0.5) +
          geom_hline(yintercept = 2, colour = "red", alpha = 0.5) +
          geom_hline(yintercept = -2, colour = "red", alpha = 0.5) +
          scale_color_manual(name = "Series", values = c("Spread Z" = "blue", "Signal" = "green")) +
          labs(title = "Spread vs Trading Signal", subtitle = str_c(coin_y, " and ", coin_x), x = "Date", y = "Spread and Signal"))
  print(ggplot(df_plot, aes(x = date_time)) +
          geom_line(aes(y = return_pair, colour = "Model"), size = 1) +
          geom_line(aes(y = return_buyhold_y, colour = "Coin Y"), size = 0.5, alpha = 0.4) +
          geom_line(aes(y = return_buyhold_x, colour = "Coin X"), size = 0.5, alpha = 0.4) +
          geom_hline(yintercept = 1, colour = "black") +
          scale_color_manual(name = "Return", values = c("Model" = "darkblue", "Coin Y" = "darkred", "Coin X" = "darkgreen")) +
          labs(title = "Model Return vs Buy Hold Return", subtitle = str_c(coin_y, " and ", coin_x), x = "Date", y = "Cumulative Return"))
}
This section displays results from individual cointegrated coin pairs selected from a training set comprising data from July 1, 2017 to September 1, 2017. The model is fit using the training set and evaluated on a test set comprising data from September 1, 2017 to September 30, 2017.
train <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-07-01", end_date = "2017-09-01")
test <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-09-01", end_date = "2017-09-30")
cointegrated_coins <- select_pairs(train = train, coin_pairs = create_pairs())
print(cointegrated_coins)
## # A tibble: 27 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1  USDT_REP  USDT_ZEC -4.815791
##  2   BTC_REP   BTC_ZEC -4.804029
##  3   BTC_ZEC   BTC_REP -4.803138
##  4  USDT_ZEC  USDT_REP -4.726716
##  5 USDT_DASH  USDT_XMR -3.679857
##  6  USDT_ZEC  USDT_LTC -3.562638
##  7  USDT_XMR USDT_DASH -3.527433
##  8  USDT_REP  USDT_LTC -3.498944
##  9   BTC_ZEC   BTC_LTC -3.474285
## 10   BTC_REP   BTC_LTC -3.449700
## # ... with 17 more rows
for (i in 1:10) {
  plot_backtest(train, test, cointegrated_coins[["coin_y"]][i], cointegrated_coins[["coin_x"]][i], 2)
}
This section displays results from individual cointegrated coin pairs selected from a training set comprising data from May 1, 2017 to July 1, 2017. The model is fit using the training set and evaluated on a test set comprising data from July 1, 2017 to July 31, 2017.
train <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-05-01", end_date = "2017-07-01")
test <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-07-01", end_date = "2017-07-31")
cointegrated_coins <- select_pairs(train = train, coin_pairs = create_pairs())
print(cointegrated_coins)
## # A tibble: 56 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1   BTC_XMR   BTC_REP -5.070502
##  2  USDT_XMR  USDT_REP -4.958723
##  3  USDT_REP  USDT_XMR -4.810539
##  4  USDT_XMR  USDT_BTC -4.809390
##  5  USDT_BTC  USDT_XMR -4.777292
##  6   BTC_REP   BTC_XMR -4.760482
##  7   BTC_XMR   BTC_ETH -4.758234
##  8   BTC_XMR   BTC_ZEC -4.695162
##  9   BTC_ZEC   BTC_ETH -4.310979
## 10   BTC_ETH   BTC_ZEC -4.175931
## # ... with 46 more rows
for (i in 1:10) {
  plot_backtest(train, test, cointegrated_coins[["coin_y"]][i], cointegrated_coins[["coin_x"]][i], 2)
}
This section displays results from individual cointegrated coin pairs selected from a training set comprising data from February 1, 2017 to April 1, 2017. The model is fit using the training set and evaluated on a test set comprising data from April 1, 2017 to April 30, 2017.
train <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-02-01", end_date = "2017-04-01")
test <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-04-01", end_date = "2017-04-30")
cointegrated_coins <- select_pairs(train = train, coin_pairs = create_pairs())
print(cointegrated_coins)
## # A tibble: 28 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1  USDT_XEM  USDT_ZEC -3.839360
##  2   BTC_ZEC   BTC_XEM -3.780460
##  3   BTC_ZEC   BTC_XMR -3.754457
##  4   BTC_XEM   BTC_ZEC -3.739462
##  5   BTC_XMR   BTC_ZEC -3.733407
##  6  USDT_ZEC  USDT_XEM -3.719327
##  7  USDT_XMR  USDT_ZEC -3.668113
##  8  USDT_ZEC  USDT_XMR -3.641600
##  9  USDT_XEM  USDT_XMR -3.269134
## 10  USDT_XMR  USDT_XEM -3.199054
## # ... with 18 more rows
for (i in 1:10) {
  plot_backtest(train, test, cointegrated_coins[["coin_y"]][i], cointegrated_coins[["coin_x"]][i], 2)
}
This section displays results from individual cointegrated coin pairs selected from a training set comprising data from January 1, 2017 to March 1, 2017. The model is fit using the training set and evaluated on a test set comprising data from March 1, 2017 to March 31, 2017.
train <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-01-01", end_date = "2017-03-01")
test <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-03-01", end_date = "2017-03-31")
cointegrated_coins <- select_pairs(train = train, coin_pairs = create_pairs())
print(cointegrated_coins)
## # A tibble: 29 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1  USDT_ETH USDT_DASH -3.756716
##  2  USDT_REP  USDT_ZEC -3.675519
##  3  USDT_REP  USDT_BTC -3.628942
##  4  USDT_LTC  USDT_XMR -3.574581
##  5  USDT_REP  USDT_ETH -3.446473
##  6  USDT_REP  USDT_LTC -3.418863
##  7  USDT_REP USDT_DASH -3.388481
##  8  USDT_REP  USDT_XEM -3.387934
##  9  USDT_REP  USDT_XMR -3.362505
## 10 USDT_DASH  USDT_ETH -3.258929
## # ... with 19 more rows
for (i in 1:10) {
  plot_backtest(train, test, cointegrated_coins[["coin_y"]][i], cointegrated_coins[["coin_x"]][i], 2)
}
This section displays results from a pairs trading strategy using an equally weighted portfolio of cointegrated coin pairs.
train <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-07-01", end_date = "2017-09-01")
test <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = "2017-09-01", end_date = "2017-09-30")
test <- test %>%
  mutate(return_strategy = backtest_strategy(train, test, select_pairs(train, create_pairs()), 2))
ggplot(test, aes(x = date_time)) +
  geom_line(aes(y = return_strategy, colour = "Strategy"), size = 1) +
  geom_line(aes(y = USDT_BTC / USDT_BTC[1], colour = "USDT_BTC"), size = 0.5, alpha = 0.4) +
  geom_hline(yintercept = 1, colour = "black") +
  scale_color_manual(name = "Return", values = c("Strategy" = "darkblue", "USDT_BTC" = "darkred")) +
  labs(title = "Strategy Return vs Buy Hold Return", x = "Date", y = "Cumulative Return")
This section displays results from a pairs trading strategy using an equally weighted portfolio of cointegrated coin pairs where new pairs are selected each month. The cross-validation is run iteratively through time: each one-month test set is preceded by a two-month training set, and together the test sets cover January 1, 2017 to October 1, 2017.
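The rolling windows can be laid out in advance to check the schedule. The sketch below is an illustration in base R; shift_months() is a hypothetical helper built on seq(), used here in place of the months() arithmetic (presumably lubridate's) in the loop that follows.

```r
# First day of each test month.
test_starts <- seq(as.Date("2017-01-01"), as.Date("2017-09-01"), by = "month")

# Shift a date by k calendar months (k may be negative).
shift_months <- function(d, k) seq(d, by = paste(k, "month"), length.out = 2)[2]

train_start <- test_starts
test_end <- test_starts
for (i in seq_along(test_starts)) {
  train_start[i] <- shift_months(test_starts[i], -2)  # two months of training data
  test_end[i] <- shift_months(test_starts[i], 1)      # one month of test data
}

windows <- data.frame(train_start = train_start,
                      train_end = test_starts,
                      test_start = test_starts,
                      test_end = test_end)
print(windows)
```

Each row gives one fold: pairs are selected and the hedge ratios fitted on the training window, then traded on the following month only, so no test data enters model selection apart from the shared boundary timestamp.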
test_dates <- c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01", "2017-05-01", "2017-06-01",
                "2017-07-01", "2017-08-01", "2017-09-01")
results <- tibble()
for (test_date in test_dates) {
  test_date <- as.Date(test_date)
  print(str_c("Cross validating strategy. using train set from ", test_date - months(2), " to ", test_date,
              " and test set from ", test_date, " to ", test_date + months(1), "."))
  train <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = test_date - months(2), end_date = test_date)
  test <- prepare_data(df = pricing_data, time_resolution = 7200, start_date = test_date, end_date = test_date + months(1))
  test <- test %>%
    # Cumulative return within the month, then its period-over-period change
    mutate(return_strategy_200 = backtest_strategy(train, test, select_pairs(train, create_pairs()), 2.00),
           return_strategy_change_200 = return_strategy_200 / lag(return_strategy_200, 1) - 1) %>%
    mutate_all(~ ifelse(is.na(.), 0, .))
  results <- bind_rows(results, test)
}
## [1] "Cross validating strategy. using train set from 2016-11-01 to 2017-01-01 and test set from 2017-01-01 to 2017-02-01."
## [1] "Cross validating strategy. using train set from 2016-12-01 to 2017-02-01 and test set from 2017-02-01 to 2017-03-01."
## [1] "Cross validating strategy. using train set from 2017-01-01 to 2017-03-01 and test set from 2017-03-01 to 2017-04-01."
## [1] "Cross validating strategy. using train set from 2017-02-01 to 2017-04-01 and test set from 2017-04-01 to 2017-05-01."
## [1] "Cross validating strategy. using train set from 2017-03-01 to 2017-05-01 and test set from 2017-05-01 to 2017-06-01."
## [1] "Cross validating strategy. using train set from 2017-04-01 to 2017-06-01 and test set from 2017-06-01 to 2017-07-01."
## [1] "Cross validating strategy. using train set from 2017-05-01 to 2017-07-01 and test set from 2017-07-01 to 2017-08-01."
## [1] "Cross validating strategy. using train set from 2017-06-01 to 2017-08-01 and test set from 2017-08-01 to 2017-09-01."
## [1] "Cross validating strategy. using train set from 2017-07-01 to 2017-09-01 and test set from 2017-09-01 to 2017-10-01."
results <- results %>%
  mutate(return_strategy_cumulative_200 = cumprod(1 + return_strategy_change_200),
         date_time = as.POSIXct(date_time, origin = "1970-01-01"))
ggplot(results, aes(x = date_time)) +
  geom_line(aes(y = return_strategy_cumulative_200, colour = "2.00"), size = 1) +
  geom_hline(yintercept = 1, colour = "black") +
  labs(title = "Strategy Return vs Buy Hold Return", x = "Date", y = "Cumulative Return")